Deep Salience: Visual Salience Modeling via Deep Belief Propagation
Abstract
Visual salience is an intriguing phenomenon observed in biological neural systems. Numerous attempts have been made to model visual salience mathematically using various feature contrasts, either locally or globally. However, these algorithmic models tend to ignore the problem's biological solutions, in which visual salience appears to arise during the propagation of visual stimuli along the visual cortex. In this paper, inspired by the conjecture that salience arises from deep propagation along the visual cortex, we present a Deep Salience model: a multi-layer model based on successive Markov random fields (sMRF) that analyzes the input image successively through deep belief propagation. As a result, the foreground object can be automatically separated from the background in a fully unsupervised way. Experimental evaluation on the benchmark dataset shows that our Deep Salience model consistently outperforms eleven state-of-the-art salience models, yielding higher rates in the precision-recall tests and attaining the best F-measure and mean-square error in the experiments.
Introduction
Automated detection of visual objects in images and videos is a subject of primary interest because of its wide application in image/video indexing, content-aware editing, medical image analysis, intelligent human-computer interfaces, robotic vision, and visual surveillance. Researchers in artificial intelligence and computer vision have successfully developed a number of methods for object detection, such as AdaBoost face detection (Viola and Jones 2001), SVM-based human detection (Vedaldi et al 2009; Dalal and Triggs 2005), and min-cut object segmentation (Rother et al 2004). These approaches usually depend on training on predefined datasets, or on user input such as scribbles or trimaps.
However, when no prior knowledge of image content is available, unsupervised object detection is a hard problem, and it has attracted considerable interest from the research community. The past decade has seen consistent progress towards unsupervised image segmentation and object detection. A widely adopted approach is to consider an image principally as a set of hierarchical contours (Arbelaez et al 2011). This assumes that semantic content and objects usually correspond to specific image structures. Recent research (Farabet et al 2013; Kohli et al 2013) has also suggested that this hierarchical view of image content may be correlated with the deep learning of image structures (Hinton et al 2006). In a development of this approach, we propose in this paper a Deep Salience model based on successive Markov Random Fields (sMRF) for unsupervised object detection. Our work is conceptually related to the recent pioneering work on hierarchical image analysis (Farabet et al 2013; Kohli et al 2013).
Unsupervised object detection usually leads to the topic of visual salience, which stems from psychological research on biological visual perception (Koch and Ullman 1985; Itti and Koch 2001). The earliest bio-inspired computational salience model was proposed by Koch and Ullman (1985), in which the contrast between visual stimuli (pixels) was considered the origin of salience awareness. A number of publications (Itti et al 1998; Ma and Zhang 2003; Harel et al 2006; Hou and Zhang 2007; Judd 2009) have followed this roadmap to develop their salience models using a variety of features. These methods are usually based on local contrast and tend to produce higher salience values near edges instead of uniformly highlighting salient objects. Cheng et al (2011) categorized these approaches as local approaches. Recent efforts have been made towards using global contrasts, where pixels or regions are evaluated with respect to the entire image.
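To make the idea of a global contrast concrete, the following is a minimal sketch in which each pixel is scored by its distance from the mean color of the whole image. It is an illustration only, not the method of any paper cited here and not the Deep Salience model; the function name `global_contrast_salience` and the choice of Euclidean distance in the input color space are assumptions made for this example.

```python
import numpy as np


def global_contrast_salience(image):
    """Score each pixel by its distance from the mean image color.

    Hypothetical helper for illustration: `image` is an H x W (gray) or
    H x W x C (color) array. Returns an H x W map scaled to [0, 1].
    """
    pixels = np.asarray(image, dtype=np.float64)
    if pixels.ndim == 2:
        # Treat a grayscale image as having a single channel.
        pixels = pixels[..., np.newaxis]

    # The "global" reference: one mean color computed over the entire image.
    mean_color = pixels.reshape(-1, pixels.shape[-1]).mean(axis=0)

    # Contrast of each pixel against that global reference.
    distance = np.linalg.norm(pixels - mean_color, axis=-1)

    # Normalize so the most contrasting pixel has salience 1.
    peak = distance.max()
    if peak == 0:
        return np.zeros_like(distance)
    return distance / peak
```

Because every pixel is compared against a statistic of the entire image rather than its immediate neighborhood, a uniformly colored object that differs from the background receives a roughly uniform score instead of a response concentrated near its edges.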
Achanta et al (2009) proposed a frequency-tuned method that defines pixel salience using region-averaged contrast. Goferman et al (2012) used block-based global contrast. Cheng et al (2011) extended Achanta's method to region-based salience estimation. Perazzi et al (2012) further extended this region-contrast approach by leveraging superpixels. Jiang and Crookes (2012) used mutual information (MI) evaluation with a center-surround a priori map for global salience estimation. Peng et al (2013) introduced low-rank matrix computation for salience modeling. In summary, both local and global methods have been based on modeling salience using various visual contrast definitions with various features based on pixels, blocks or regions.
From a biological viewpoint, we consider the conjecture that human visual salience is a consequence of the deep
Copyright © 2014, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence
Publication date: 2014